Advanced logistic regression analysis

This example shows how to analyse the probability of getting a job and earning more than NOK 500,000 in the year after being without a job. The population is restricted to persons aged 16-60.

The analysis controls for various demographic characteristics as well as labour market status (unemployed, ordinary labour market measures, vocationally disabled, other jobseeker statuses, and work disability).

Descriptive statistics are created first. Finally, a logit analysis is run, with marginal effects requested through the option mfx(dydx).
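To make the idea of a marginal effect concrete, here is a minimal Python sketch of the standard derivative formula for a logit model: the change in the predicted probability for a small change in one variable is the coefficient times p(1 - p). The function names and the example numbers are hypothetical, and this is not a description of how mfx(dydx) is implemented in the platform; whether the effects are evaluated at the means or averaged over observations is not stated here.

```python
import math


def logistic(z):
    """Logistic function: maps a linear index z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect(beta_k, linear_index):
    """Derivative dP/dx_k of the logit probability at a given linear index.

    beta_k       -- logit coefficient of the variable of interest (hypothetical value)
    linear_index -- x'beta evaluated at the chosen point, e.g. at the sample means
    """
    p = logistic(linear_index)
    return beta_k * p * (1.0 - p)


# Illustrative values only: coefficient 0.8, evaluated where the predicted probability is 0.5
effect = marginal_effect(0.8, 0.0)
print(effect)
```

With these illustrative numbers the effect is roughly 0.2, meaning a one-unit increase in the variable raises the predicted probability by about 20 percentage points at that point. The effect shrinks as the predicted probability moves towards 0 or 1, which is why marginal effects are reported separately from the raw logit coefficients.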

//Connect to database
require no.ssb.fdb:33 as db

//Create population of persons aged 16-60 without a job in November 2018 and resident in Norway as of 1 January 2019
create-dataset demographydata
import db/BEFOLKNING_FOEDSELS_AAR_MND as birth_year_month
import db/BEFOLKNING_STATUSKODE 2019-01-01 as regstat
import db/REGSYS_ARB_ARBMARK_STATUS 2018-11-16 as labourstat

generate age = 2018 - int(birth_year_month / 100)
generate job = inlist(labourstat,'1','2')
keep if inrange(age,16,60) & regstat == '1' & job == 0

histogram age, discrete

//Import relevant variables (demographic data are mostly measured as of 1 January each year)
import db/BEFOLKNING_KJOENN as gender
import db/BEFOLKNING_INVKAT as imm_cat
import db/SIVSTANDFDT_SIVSTAND 2018-11-16 as civstat
import db/BEFOLKNING_BARN_I_REGSTAT_FAMNR 2019-01-01 as children
import db/NUDB_BU 2018-11-16 as edu
import db/NUDB_SOSBAK as social_background
import db/BEFOLKNING_KOMMNR_FAKTISK 2019-01-01 as municipality
import db/ARBSOEK2001FDT_HOVED 2018-11-16 as work_seeker_stat
import db/UFOERP2011FDT_GRAD 2018-11-16 as disability_level
import db/INNTEKT_BRUTTOFORM 2018-12-31 as wealth
import db/INNTEKT_WYRKINNT 2019-12-31 as work_income19

//Create a dependent variable with two outcomes (dummy variable): High work income vs. low work income
histogram work_income19, width(100000) freq
summarize work_income19
generate high_income = work_income19 > 500000
piechart high_income

//Adapt the independent variables to suit the statistical model (most of them need to be transformed into dummy variables)
generate male = gender == '1'
piechart male

destring civstat
generate married = civstat == 2
piechart married

generate immigrant = imm_cat == 'B'
piechart immigrant

tabulate children, missing
generate child = children == 1

generate more_children = children > 1

destring edu
generate high_edu = inrange(edu,700000,899999)
piechart high_edu

generate high_edu_parents = social_background == '1'
piechart high_edu_parents

generate oslo = municipality == '0301'
generate bergen = municipality == '1201'
generate stavanger = municipality == '1103'
generate trondheim = municipality == '5001'
barchart(sum) oslo bergen stavanger trondheim

destring work_seeker_stat
tabulate work_seeker_stat, missing

generate unempl = work_seeker_stat == 1
generate measure = work_seeker_stat == 3
generate voc_disabled = work_seeker_stat == 5 | work_seeker_stat >= 10
generate other_workseekers = work_seeker_stat == 2 | work_seeker_stat == 4 | work_seeker_stat == 7 

generate disabled = !sysmiss(disability_level) 

barchart(sum) unempl measure voc_disabled other_workseekers disabled

histogram wealth, width(100000) freq
summarize wealth
generate wealth_high = wealth > 1000000
piechart wealth_high

//Use Sankey diagrams to show transitions between states
sankey work_seeker_stat high_income
sankey high_edu high_income

//Run the logit analysis; the dependent variable is always listed first and must be a dummy variable
logit high_income male married age immigrant child more_children high_edu high_edu_parents oslo bergen stavanger trondheim unempl measure voc_disabled other_workseekers disabled wealth_high, mfx(dydx)